We describe a test statistic for unbinned goodness-of-fit of data in one dimension. The statistic is based on
the two-dimensional Random Walk. The rejection power of this test is explored for both simple and compound
hypotheses and, for the examples explored, is found to be comparable to that of the $\chi^2$ test. We discuss
briefly how it may be possible to extend this test to multidimensional data.



1. Introduction 

This search for an unbinned goodness-of-fit test has been motivated by
the widespread use of unbinned maximum likelihood fitting for determining
CP-violating parameters at Belle. While there are many cross-checks to
ensure that there are no spurious signals or biases, the fits tend to be
complicated and not very transparent. They often involve probability
density functions (PDFs) that differ with every event, based on measured
quantities that add dimensions to the data that are not explicit in the
fits. As there is no widely accepted unbinned goodness-of-fit test that
applies to such fits, testing for statistical consistency of results has
been uneven. The tests that have been done, resorting to binning or toy
Monte Carlo, have their place but have not been entirely satisfactory in
addressing the question.

A common technique in unbinned tests is first to transform the measured
quantities to a variable in which the PDF under the null hypothesis is
flat, and then to test this "flattened" distribution for consistency with
uniformity. A variety of tests for uniformity exists, but most are not
readily extended to multidimensional data, and they do not address
compound hypotheses. A review of methods is given in [1].

In this report, we explore a test statistic that is based on the
two-dimensional Random Walk. To begin, its distribution in the case of a
flat PDF is discussed. The ensemble distribution is then found for
several alternate hypotheses, and the rejection power is calculated for
comparison with other goodness-of-fit tests. As the aim of a
goodness-of-fit test as it would be applied at Belle is to test the
validity of the parametrization used in fitting, it is also important to
examine how the test is modified under compound hypotheses. The
discussion is thus expanded to include data which are fitted to determine
one or more parameters. Finally, we discuss the possibility of extending
the test to multidimensional data.



2. Random Walk as a Test of Flatness 

A dataset consisting of $N$ measurements of the one-dimensional quantity
$x$ lying in the interval [0, 1] may be mapped trivially to points on a
unit circle with polar angle $\phi = 2\pi x$ on the interval $[0, 2\pi]$,
so that each point is considered to be a unit vector with direction
defined by $\phi$. If the PDF in $x$ is flat, the vector sum of the
corresponding unit vectors in two dimensions corresponds to the net
displacement, $D$, after a two-dimensional Random Walk of $N$ steps with
unit step size. For sufficiently large $N$, the distribution of $D$
converges to a well-known form (Rayleigh, 1888), and the distribution in
$D^2$ is an exponential decay with mean equal to $N$. We take $D^2/N$ as
the test statistic. A deviation of the root distribution from the
hypothesis will result in a bias of the ensemble distribution of this
test statistic away from the origin. This statistic is mathematically
equivalent to the first order term in the Fourier series that describes
the distribution of the data:



$$F(k=1) = \int_0^{2\pi} d\phi \sum_{i=1}^{N} \delta(\phi - \phi_i)\, e^{i\phi} \qquad (1)$$

where one can see that $D^2 \propto |F(k=1)|^2$. One would expect this
distribution to be most sensitive to an overall imbalance of the PDF in
generally opposite $\phi$ directions. To obtain sensitivity to higher
order differences, one could thus take successively higher order terms in
the series, for $k = 2, \ldots$. In practice it may not be useful to
examine terms above $k = 3$. In this study we look at $k = 1$ (d = 1) and
define $K_k \equiv D_k^2/N$. What we have defined as $K_1$ appears in the
review of D'Agostino and Stephens as R in the context of the Von Mises
test, a test for uniformity on a circle.
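
The definition above can be written out in a few lines. The following is a minimal sketch, not the code used for this study: the function name `random_walk_statistic` is hypothetical, and it assumes the data have already been flattened onto [0, 1] and mapped to the circle by $\phi = 2\pi x$.

```python
import numpy as np

def random_walk_statistic(x, k=1):
    """Return K_k = D_k^2 / N for data x already flattened onto [0, 1]."""
    x = np.asarray(x, dtype=float)
    phi = 2.0 * np.pi * x                  # map [0, 1] onto the unit circle
    # Vector sum of unit steps at angles k*phi (k-th order Fourier term).
    d_squared = np.abs(np.exp(1j * k * phi).sum()) ** 2
    return d_squared / x.size

rng = np.random.default_rng(1)
sample = rng.uniform(0.0, 1.0, size=100)   # one flat "experiment"
print(random_walk_statistic(sample))
print(random_walk_statistic(sample, k=2))
```

Two limiting cases illustrate the behaviour: $N$ identical points give $D = N$ and hence $K_1 = N$, while two points at opposite angles cancel and give $K_1 = 0$.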
3. Flat PDF

As mentioned above, the $K_1$ distribution for a flat PDF converges
rapidly to an exponential with a decay constant of unity. Figure 1 (top
row) shows the distributions in $K_1$ for ensembles of randomly generated
experiments containing $N = 10$, 100, and 1000 events. Each of the three
distributions is fitted via binned maximum likelihood to an exponential
form. The fitted inverse decay constants ("slopes") are 0.992 ± 0.010,
1.008 ± 0.033, and 1.039 ± 0.049, respectively, in excellent agreement
with the expectation.
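
A quick numerical check of this behaviour is sketched below. It is an illustration under stated assumptions rather than a reproduction of the fits above: the helper `k1_ensemble` and the ensemble size of 5000 are hypothetical choices, and the sample mean of $K_1$ and the tail fraction above 3.0 stand in for the binned likelihood fit.

```python
import numpy as np

def k1_ensemble(n_events, n_experiments, rng):
    """K_1 for each of n_experiments flat samples of n_events points."""
    phi = 2.0 * np.pi * rng.uniform(size=(n_experiments, n_events))
    d_squared = np.cos(phi).sum(axis=1) ** 2 + np.sin(phi).sum(axis=1) ** 2
    return d_squared / n_events

rng = np.random.default_rng(2)
for n_events in (10, 100, 1000):
    k1 = k1_ensemble(n_events, 5000, rng)
    # A unit exponential has mean 1 and a tail fraction exp(-3) ~ 0.05 above 3.
    print(n_events, k1.mean(), (k1 > 3.0).mean())
```

For a unit exponential the printed mean should scatter around 1 and the tail fraction around 0.05, within the statistical precision of the ensemble.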



Figure 1: (top row) Distributions in $K_1$ for flat PDF: experiments with $N = 10$, $N = 100$, and $N = 1000$, shown with
fits to an exponential form. (bottom row) Distributions in $K_1$ for PDF with the form $0.3 + 1.4X$ with $N = 10$,
$N = 100$, and $N = 1000$.



To evaluate rejection power, these distributions may be compared with
those obtained for PDFs that are not flat. The alternative hypotheses
used in a study by Aslan and Zech provide a convenient range of function
types and allow for a direct comparison with the range of tests reviewed
in their work. In that paper the rejection power for an alternative
hypothesis was defined as one minus the probability for an error of the
second kind, given a criterion that yields a 5% significance for the null
hypothesis. Since in this case the null hypothesis gives an exponential
distribution with unit decay constant, the 5% criterion is $K_1 > 3.0$.
Ensembles of experiments were generated for each of three functions used
in that study:

$$A_1(X) = 0.3 + 1.4X \qquad (2)$$

$$A_2(X) = 0.7 + 0.3\,[n_2 e^{-64(X-0.5)^2}] \qquad (3)$$

$$A_3(X) = 0.8 + 0.2\,[n_3 e^{-256(X-0.5)^2}] \qquad (4)$$

where the $n_i$ are normalization constants for the associated Gaussians.
All functions are defined in the interval [0, 1]. The resulting $K_1$
distributions for $A_1$ are shown in Figure 1 (bottom row). The values
for rejection power are summarized in Table I. For comparison, the values
for the $\chi^2$ method ($N = 100$) given by Aslan and Zech are
approximately 0.81, 0.85, and 0.81, respectively, so our method is
comparable in power, at least in the case of these three functions.
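
The rejection power for the linear alternative can be estimated with a short toy Monte Carlo. The sketch below is illustrative only: the function names and the ensemble size of 4000 are assumptions, the cut is taken as $-\ln 0.05 \approx 3.0$ from the unit exponential, and events are drawn from $0.3 + 1.4X$ by inverting its cumulative distribution $0.3X + 0.7X^2$.

```python
import numpy as np

K1_CUT = -np.log(0.05)   # ~3.0: 5% significance for a unit exponential

def sample_a1(size, rng):
    """Draw X from A_1(X) = 0.3 + 1.4 X on [0, 1] by inverting its CDF."""
    u = rng.uniform(size=size)
    # CDF is 0.3 x + 0.7 x^2; take the positive root of the quadratic.
    return (-0.3 + np.sqrt(0.09 + 2.8 * u)) / 1.4

def rejection_power(n_events, n_experiments, rng):
    """Fraction of A_1 experiments whose K_1 exceeds the 5% cut."""
    phi = 2.0 * np.pi * sample_a1((n_experiments, n_events), rng)
    d_squared = np.cos(phi).sum(axis=1) ** 2 + np.sin(phi).sum(axis=1) ** 2
    return (d_squared / n_events > K1_CUT).mean()

rng = np.random.default_rng(3)
for n_events in (10, 100, 1000):
    print(n_events, rejection_power(n_events, 4000, rng))
```

The null hypothesis here is flat, so no flattening step is needed; the generated $X$ values are mapped directly to angles.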

Function                   Rejection Power
                           N = 10     N = 100    N = 1000
$A_1$ (Linear)             0.117      0.824      1.00
$A_2$ (Wide Gaussian)      0.152      0.910      1.00
$A_3$ (Narrow Gaussian)    0.102      0.672      1.00

Table I: Rejection power for functions $A_1$, $A_2$, and $A_3$
with a flat null hypothesis.

In order to apply this method as a goodness-of-fit test for non-uniform
null hypotheses, the PDF, $f(X)$, must first be transformed to a "flat"
variable, $Y$, in which the probability distribution is flat. To form a
uniform null hypothesis on a circle one could, for example, construct $Y$
as:

$$Y_i = 2\pi \int_{X_-}^{X_i} f(X)\,dX \qquad (5)$$

where the integer subscript $i$ denotes the $i$th data point and $X_-$ is
the lowest possible value of $X$.
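
The flattening transformation can be carried out numerically when the integral has no convenient closed form. The following is a sketch under assumptions, not the procedure used in this study: `flatten_to_angle` and `k1_from_angles` are hypothetical names, the integral is approximated by the trapezoid rule on a fixed grid, and the cumulative integral is renormalized so that the PDF passed in need not be normalized.

```python
import numpy as np

def flatten_to_angle(x, pdf, x_lo, x_hi, n_grid=2001):
    """Approximate Y_i = 2*pi * integral from x_lo to x_i of pdf(X) dX."""
    grid = np.linspace(x_lo, x_hi, n_grid)
    density = pdf(grid)
    # Cumulative trapezoid rule, normalised so the CDF ends at exactly 1.
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    return 2.0 * np.pi * np.interp(x, grid, cdf)

def k1_from_angles(y):
    """K_1 = D^2 / N for angles y on the unit circle."""
    return np.abs(np.exp(1j * y).sum()) ** 2 / y.size

# Example: data drawn from f(X) = 2(1 - X) and flattened with the same PDF.
rng = np.random.default_rng(4)
data = 1.0 - np.sqrt(1.0 - rng.uniform(size=100))   # inverse CDF of 2(1 - X)
angles = flatten_to_angle(data, lambda t: 2.0 * (1.0 - t), 0.0, 1.0)
print(k1_from_angles(angles))
```

When the data really follow the PDF used for flattening, the resulting $K_1$ values should be distributed as in the flat case.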




Figure 2: Determination of rejection power for a compound hypothesis: ensembles fitted for decay constant of
exponentially decaying form. (top row) PDF matches fit parametrization: (left) raw distribution, (center, right)
distributions in $K_1$ of fitted, flattened experiments, $N = 100$ and $N = 1000$. (bottom row) PDF inconsistent with
parametrization: (left) raw distribution, (center, right) distributions in $K_1$ of fitted, flattened experiments, $N = 100$
and $N = 1000$.



4. Compound hypotheses 

The examples considered thus far have been ones where no parameter
fitting has occurred. While this has been an instructive exercise, it has
limited application, as most measurements in particle physics involve the
fitting of measured distributions to determine shapes and to derive some
physics quantity or conclusion. We now look at compound hypotheses.

In evaluating rejection of alternative hypotheses via toy MC in the
compound case, it is important that the fitting process be integrated
into the evaluation procedure. Consider a data set $\{\phi_i\}$ where the
PDF is assumed to be parametrizable as $f(\phi; \alpha)$ and the unbinned
likelihood is maximum for $\alpha = \alpha_{max}$. The data are then
flattened assuming the PDF is $f(\phi; \alpha_{max})$, and the associated
$K_1$ is evaluated. The confidence level of this $K_1$ value may then be
found by referencing the ensemble distribution of $K_1$ when the true PDF
is $f(\phi; \alpha_{max})$, and each experiment of the ensemble is
treated as data, fitted and flattened according to the fit.
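
The procedure can be sketched end to end for a simple one-parameter family. Everything specific in this example is an assumption made for illustration: the PDF is taken to be proportional to $e^{-\lambda X}$ on [0, 1], the function names are hypothetical, and the maximum likelihood fit is done by bisection on the condition that the model mean equals the sample mean, which holds for this particular family and is not a general-purpose fitter.

```python
import numpy as np

def model_mean(lam):
    # Mean of the density proportional to exp(-lam * x) on [0, 1].
    return 1.0 / lam - 1.0 / np.expm1(lam)

def fit_lambda(x, lo=1e-6, hi=50.0):
    """ML estimate of lam: the model mean must equal the sample mean."""
    target = np.mean(x)
    for _ in range(60):                    # bisection; model_mean falls with lam
        mid = 0.5 * (lo + hi)
        if model_mean(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cdf(x, lam):
    return np.expm1(-lam * x) / np.expm1(-lam)

def generate(n, lam, rng):
    u = rng.uniform(size=n)
    return -np.log1p(u * np.expm1(-lam)) / lam   # inverse of cdf

def fitted_k1(x):
    """Fit, flatten with the fitted PDF, and return K_1."""
    phi = 2.0 * np.pi * cdf(x, fit_lambda(x))
    return np.abs(np.exp(1j * phi).sum()) ** 2 / x.size

def p_value(data, n_toys, rng):
    """Fraction of fitted toy experiments with K_1 at least as large as the data."""
    lam_hat = fit_lambda(data)
    k1_data = fitted_k1(data)
    toys = np.array([fitted_k1(generate(data.size, lam_hat, rng))
                     for _ in range(n_toys)])
    return (toys >= k1_data).mean()

rng = np.random.default_rng(5)
data = 1.0 - np.sqrt(1.0 - rng.uniform(size=1000))   # drawn from 2(1 - X)
print(fit_lambda(data), fitted_k1(data), p_value(data, 500, rng))
```

The essential point is that each toy experiment is refitted before flattening, so the reference distribution includes the effect of the fit.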

This procedure was used to evaluate rejection power for pairs of
similarly shaped PDFs. Here we show one such result, for the hypothesis
$n_4(\alpha) e^{-10X/\alpha}$, where $n_4$ is a normalization constant,
the measured quantity is $X$, and experiments are fitted for $\alpha$.
The alternative PDF was the linear form $f(X) = 2(1 - X)$. Experiments
(A) were generated according to the alternative PDF, and each was fitted
to the hypothesis. The mean maximum likelihood value of $\alpha$ was
approximately 4.7. Ensembles (B) were generated according to the
hypothesis, with $\alpha = 4.7$, and fitted in the same way. The 5%
confidence criterion on $K_1$ for (B) and acceptance of this criterion
for (A) were estimated by counting (Figure 2). The rejection powers were
found to be 28% and 99% for $N = 100$ and $N = 1000$, respectively. For
comparison we also calculated by the same procedure the rejection power
of the $\chi^2$ test, using 20 bins in the interval [0, 1], and found
powers of 13% and 100%, respectively.
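
The fitted value of about 4.7 can be cross-checked analytically. The following worked estimate assumes that the hypothesis has the form $n_4(\alpha) e^{-10X/\alpha}$ on [0, 1], as listed in Table II, and uses the fact that for an exponential family of this kind the maximum likelihood fit matches the model mean of $X$ to the sample mean. With $\lambda \equiv 10/\alpha$ introduced here as shorthand:

$$\langle X \rangle_{\rm alt} = \int_0^1 X \cdot 2(1 - X)\,dX = \frac{1}{3}$$

$$\langle X \rangle_{\rm hyp} = \frac{1}{\lambda} - \frac{1}{e^{\lambda} - 1}$$

Setting the two means equal gives $\lambda \approx 2.15$ by numerical solution, so that $\alpha = 10/\lambda \approx 4.65$, consistent with the mean fitted value of approximately 4.7 quoted above.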




Figure 3: Scatter plots of fitted parameter $\alpha_{max}$ vs. $K_1$
for ensembles shown in Figure 2 ($N = 100$).

We also examined the two-dimensional distribution of fitted
$\alpha_{max}$ values vs. $K_1$. Any dependence of the test on the fitted
rather than underlying parameter value reduces its utility as a
goodness-of-fit test; for example, the maximum likelihood value,
$\mathcal{L}_{max}$, is not usable as a goodness-of-fit statistic because
it depends strongly on the fitted parameter value(s) $\alpha_{max}$: for
a certain class of fitting functions, the correlation is 100%. Figure 3
shows scatter plots of $\alpha_{max}$ and $K_1$, where the data were
generated with $N = 100$ and the generated and fitted forms are those
from the example of Figure 2. There appears to be no strong dependence.

In any determination of rejection power with a compound hypothesis, it is
necessary to determine the distribution of $K_1$ for the correct
hypothesis. It does not appear that there is a simple ansatz as in the
case of binned least squares fitting, where the $\chi^2$ statistic
converges to a $\chi^2$ distribution with the number of degrees of
freedom reduced by one unit for each linear fitted parameter. We study
this question empirically by generating MC ensembles for a variety of
shapes. Each ensemble was generated according to the fitted functional
form with parameter value(s) fixed. Each experiment was fitted with
parameter(s) floating, and the $K_1$ value was obtained from the data
flattened according to the best fit. The distribution of resultant $K_1$
values for each ensemble was fitted for the decay constant, assuming an
exponentially decaying form. Ensembles with $N = 10$, $N = 100$, and
$N = 1000$ were generated. The results are summarized in Table II. There
are several notable features. First, while all of the $K_1$ distributions
had a decaying form, as one might expect, and a fit that converged, not
all yielded good fits; the exponential form is not preserved under
compound hypotheses. Secondly, all inverse decay constants are greater
than unity, indicating that the $K_1$ distribution moves toward zero with
fitting. This is not surprising; fitting identifies for each experiment
the shape that is "closest" to the data, giving in general a better
goodness-of-fit than the generator shape. Finally, there is no obvious
pattern in the value of the decay constant with number of floated
parameters. However, it is seen that for a given PDF and set of fitted
parameters, the shape of the $K_1$ distribution shows remarkably little
change as $N$ is changed by two orders of magnitude.



Form                                                   Generated                                 Fitted                  $K_1$ (Decay Constant)$^{-1}$ ($\chi^2$/ndf)
                                                                                                                         N = 10                   N = 100                 N = 1000
$(1-\alpha) + \alpha(2X)$                              $\alpha = 0.7$                            $\alpha$                -                        -                       1.28 ± 0.07 (70/67)
$(1-\alpha) + \alpha[n_2 e^{-64(X-0.5)^2}]$            $\alpha = 0.3$                            $\alpha$                -                        1.90 ± 0.06 (230/80)    1.94 ± 0.09 (223/65)
$(1-\alpha) + \alpha[n_3 e^{-256(X-0.5)^2}]$           $\alpha = 0.2$                            $\alpha$                -                        1.56 ± 0.05 (203/82)    1.56 ± 0.07 (82/68)
$n_4 e^{-10X/\alpha}$                                  $\alpha = 1.0$                            $\alpha$                1.23 ± 0.01 (147/133)    1.28 ± 0.04 (68/85)     1.28 ± 0.06 (75/76)
$n_5 e^{-[X-(0.5+\alpha_2)]^2/2(\alpha_1/8)^2}$        $\alpha_1 = 1.0$, $\alpha_2 =$ [illegible]  $\alpha_1$              1.36 ± 0.01 (176/131)    1.38 ± 0.05 (93/85)     1.50 ± 0.07 (56/65)
                                                                                                 $\alpha_2$              1.22 ± 0.01 (154/135)    1.25 ± 0.04 (122/96)    1.28 ± 0.06 (73/72)
                                                                                                 $\alpha_1, \alpha_2$    1.84 ± 0.019 (148/90)    2.00 ± 0.065 (53/59)    2.13 ± 0.095 (47/47)

Table II: Inverse decay constants of $K_1$ distribution for several generated forms, flattened after fitting for parameter(s)
$\{\alpha_i\}$. The $n_i$ are normalization constants, which may depend on the parameters $\alpha_j$. No entry is made for samples
where low statistics resulted in best fits which were at the limits of the parametrization.
5. Extension to multidimensional data: speculation

Our goal in this investigation has been to arrive at a multidimensional
unbinned goodness-of-fit test, one that has rejection power in all
dimensions, not just in one-dimensional projections, for multidimensional
data. Many unbinned tests depend on the integrated sum of or spacings
between neighboring data points, quantities which are not well-defined
when extended to more than one dimension. Although the $K_1$ statistic
does not have this property, it is yet to be determined whether there
exists an extension that is fully multidimensional; for example, in two
dimensions, two components each mapped to a circle correspond to a data
space that is the surface of a toroid, for which there is no obvious
nontrivial vector sum that maps to the Random Walk. A fully general
extension to multidimensional data will additionally require a flattening
algorithm and provisions for data spaces of arbitrary shape. We will
continue to explore the possibilities for extending $K_1$ for use with
multidimensional data.

6. Summary 

We have explored an unbinned goodness-of-fit test for data in one
dimension that is based on the mapping of flattened distributions to a
two-dimensional random walk. This method is truly binning-free and
scale-independent, and the ensemble distribution for the null hypothesis
is well-defined. For a compound hypothesis we specify a procedure to
determine the ensemble distribution of the test statistic via Monte Carlo
so that rejection power may be readily determined. The distribution is
found for several different parametrized forms and shown to be largely
independent of statistics. We examine several samples for dependence
between the test statistic and fitted parameter values, and find no
evidence of any. The rejection power for alternate hypotheses is
demonstrated for a few examples and is found to be comparable to that of
the $\chi^2$ method.